Goto

Collaborating Authors

 word embedding and topic modeling


Distilled Wasserstein Learning for Word Embedding and Topic Modeling

Neural Information Processing Systems

We propose a novel Wasserstein method with a distillation mechanism, yielding joint learning of word embeddings and topics. The proposed method is based on the fact that the Euclidean distance between word embeddings may be employed as the underlying distance in the Wasserstein topic model. The word distributions of topics, their optimal transport to the word distributions of documents, and the embeddings of words are learned in a unified framework. When learning the topic model, we leverage a distilled ground-distance matrix to update the topic distributions and smoothly calculate the corresponding optimal transports. Such a strategy provides the updating of word embeddings with robust guidance, improving algorithm convergence. As an application, we focus on patient admission records, in which the proposed method embeds the codes of diseases and procedures and learns the topics of admissions, obtaining superior performance on clinically-meaningful disease network construction, mortality prediction as a function of admission codes, and procedure recommendation.


Reviews: Distilled Wasserstein Learning for Word Embedding and Topic Modeling

Neural Information Processing Systems

Summary The authors present a Distilled Wasserstein Learning (DWL) method for simultaneously learning a topic model alongside a word embedding using a model/approach based on the Wasserstein distance applied to elements of finite simplices. This is claimed as the first such method to simultaneously fit topics alongside embeddings. In particular, their embeddings only exploit on document co-occurence rather than nearby co-occurence within a sequence (i.e. using word order information) such as with word2vec. The authors demonstrate the superiority of their embeddings against a variety of benchmarks on three tasks: mortality prediction, admissions-type prediction, and procedure recommendation, using a single corpus of patient admission records where words are the international classification of diseases (ICD) ids of procedures and diseases. There are a number of apparently novel features to their approach which they outline in their paper, namely: * It is a topic model where observed word frequencies within a document are approximated as the *barycentres* (centre of mass) of a weighted sum over a low rank basis of topics (where these barycentres are with respect to some Wasserstein distance).


Distilled Wasserstein Learning for Word Embedding and Topic Modeling

Neural Information Processing Systems

We propose a novel Wasserstein method with a distillation mechanism, yielding joint learning of word embeddings and topics. The proposed method is based on the fact that the Euclidean distance between word embeddings may be employed as the underlying distance in the Wasserstein topic model. The word distributions of topics, their optimal transport to the word distributions of documents, and the embeddings of words are learned in a unified framework. When learning the topic model, we leverage a distilled ground-distance matrix to update the topic distributions and smoothly calculate the corresponding optimal transports. Such a strategy provides the updating of word embeddings with robust guidance, improving algorithm convergence.